1 OHMEGA: a VLSI superscalar processor architecture for numerical 84%
applications

Masaitsu Nakajima , Hiraku Nakano , Yasuhiro Nakakura , Tadahiro Yoshida , Yoshiyuki Goi , Yuji 
Nakai , Reiji Segawa , Takeshi Kishida , Hiroshi Kadota 

ACM SIGARCH Computer Architecture News , Proceedings of the 18th annual
international symposium on Computer architecture April 1991 
Volume 19 Issue 3 



2 An evaluation of branch architectures 80% 



Proceedings of the 14th annual international symposium on Computer architecture
June 1987

Branch instructions form a significant fraction of executed instructions, and their design is 
thus a crucial component of any architecture. This paper examines three alternatives in the 
design of branch instructions: delayed vs. non-delayed branches, one- vs. two-instruction 
branches, and the use or non-use of condition codes. Simulation and analytical techniques 
are used to provide quantitative comparisons between these choices. 

3 Software and hardware parallelism on the iWarp multi-computer 80%

Herbert G. Mayer , Brent Baxter

Proceedings of the 5th international conference on Supercomputing June 1991 

4 Superscalar architectures: Select-free instruction scheduling logic 77% 



Mary D. Brown , Jared Stark , Yale N. Patt 

Proceedings of the 34th annual ACM/IEEE international symposium on
Microarchitecture December 2001

Pipelining allows processors to exploit parallelism. Unfortunately, critical loops, pieces of
logic that must evaluate in a single cycle to meet IPC (Instructions Per Cycle) goals,
prevent deeper pipelining. In today's processors, one of these loops is the instruction 
scheduling (wakeup and select) logic [10]. This paper describes a technique that pipelines 
this loop by breaking it into two smaller loops: a critical, single-cycle loop for wakeup; and a
non-critical, potentially multi-cycle, lo ... 



5 Discovering machine-specific code improvements 77%

Peter B. Kessler

ACM SIGPLAN Notices , Proceedings of the 1986 SIGPLAN symposium on Compiler
construction July 1986
Volume 21 Issue 7 

I have designed and built a compiler construction tool that automates much of the case 
analysis necessary to exploit special purpose instructions on a target machine. Given a 
suitable description of the target machine, my analysis identifies instruction sequences that 
are equivalent to single instructions. During code generation, these equivalences can be 
used to avoid inefficient instruction sequences in favor of more efficient instructions. I
present a working prototype of the i ... 



6 Compiler scheduling: Reduced code size modulo scheduling in the absence of 77%
hardware support

Josep Llosa , Stefan M. Freudenberger 

Proceedings of the 35th annual ACM/IEEE international symposium on
Microarchitecture November 2002

Modulo scheduling is a very effective instruction scheduling technique that exploits
Instruction Level Parallelism (ILP) in loop bodies by overlapping the execution of successive 
iterations. Unfortunately, modulo scheduling has been shown to cause heavy code 
expansion. To avoid the penalties of code expansion, some processors have dedicated 
hardware support for modulo scheduled loops. However, this dedicated hardware support 
has a cost in chip area, cycle time, processor complexity, and compiler ... 



7 A cellular general purpose computer 77% 
R. G. Cornell , H. C. Torng

ACM SIGARCH Computer Architecture News , Proceedings of the 2nd annual
symposium on Computer architecture December 1974 
Volume 3 Issue 4 

A 2-dimensional cellular general-purpose computer is specified. This particular cellular 
computer is distinguished from previously proposed, locally-controlled cellular computers in
that the cellular structure is "hidden" from the user. At the ISP level, the machine is similar 
to a small-scale computer of the von Neumann type. However, the architecture of the 
computer does not feature physically isolated functional units to implement memory,
processor, or control. As a result, we present a machi ... 



8 Application specific compiler/architecture codesign: a case study 77% 

Oliver Wahlen , Tilman Glokler , Achim Nohl , Andreas Hoffmann , Rainer Leupers , Heinrich
Meyr

ACM SIGPLAN Notices , Proceedings of the joint conference on Languages, compilers 
and tools for embedded systems: software and compilers for embedded systems June 
2002 

Volume 37 Issue 7 

This paper proposes an architecture exploration methodology for application specific 
instruction set processors (ASIPs) including a C compiler and a VHDL model in the 
exploration loop. For a given application the target architecture is an instance of the scalable 
ALICE VLIW architecture which will be presented in this paper. In a case study it will be
explained how the LISA processor design platform in conjunction with the CoSy compiler 
environment significantly reduces the time for exploration ... 
9 Energy aware compilation for DSPs with SIMD instructions 77%
Markus Lorenz , Lars Wehmeyer , Thorsten Drager

ACM SIGPLAN Notices , Proceedings of the joint conference on Languages, compilers
and tools for embedded systems: software and compilers for embedded systems June 
2002 

Volume 37 Issue 7 

The growing use of digital signal processors (DSPs) in embedded systems necessitates the
use of optimizing compilers supporting special hardware features. In this paper we present 
compiler optimizations with the aim of minimizing energy consumption of embedded 
applications: This comprises loop optimizations for exploitation of SIMD instructions and zero 
overhead hardware loops in order to increase performance and decrease the energy
consumption. In addition, we use a phase coupled code generator ... 

10 Flow-control machines: the structured execution architecture (SXA) 77% 
J. M. Terry 

ACM SIGARCH Computer Architecture News September 1987 
Volume 15 Issue 4 

Can the looping, branching, sequencing, and modularity of structured programming be 
implemented effectively by single instructions at the machine-instruction level? A proposal 
for a class of machines to do so is presented. Flow control verbs such as REPEAT and 
IF/THEN/ELSE are represented in large-format instructions containing pointers to conditional 
controls and other instructions. An active-instruction stack is used to nest flow structures.
Independent buses for logically distinct memory spac ... 

11 Algorithm and architecture of a 1V low power hearing instrument DSP 77%
Finn Moller , Nikolai Bisgaard , John Melanson 

Proceedings of the 1999 international symposium on Low power electronics and 
design August 1999 

12 Vector architectures: past, present and future 77% 

Roger Espasa , Mateo Valero , James E. Smith

Proceedings of the 12th international conference on Supercomputing July 1998 

13 Speculative multithreaded processors 77% 
Pedro Marcuello , Antonio Gonzalez , Jordi Tubella 

Proceedings of the 12th international conference on Supercomputing July 1998 

14 A compilation technique for software pipelining of loops with conditional 77% 
jumps 
Kemal Ebcioglu

Proceedings of the 20th annual workshop on Microprogramming December 1987 

We describe a compilation algorithm for efficient software pipelining of general inner loops, 
where the number of iterations and the time taken by each iteration may be unpredictable, 
due to arbitrary if-then-else statements and conditional exit statements within the loop. As
our target machine, we assume a wide instruction word architecture that allows multi-way 
branching in the form of if-then-else trees, and that allows conditional register transfers 
depending on where the microinstruct ... 

15 Techniques for extracting instruction level parallelism on MIMD architectures 77% 
Gary Tyson , Matthew Farrens

Proceedings of the 26th annual international symposium on Microarchitecture 
December 1993



16 Implementing signatures for C++ 77% 

Gerald Baumgartner , Vincent F. Russo 
ACM Transactions on Programming Languages and Systems (TOPLAS) January 1997

Volume 19 Issue 1 

We outline the design and detail the implementation of a language extension for abstracting 
types and for decoupling subtyping and inheritance in C++. This extension gives the user 
more of the flexibility of dynamic typing while retaining the efficiency and security of static 
typing. After a brief discussion of syntax and semantics of this language extension and 
examples of its use, we present and analyze three different implementation techniques: a 
preprocessor to a C++ compiler, an implem ... 



17 Compiler transformations for high-performance computing 77% 
David F. Bacon , Susan L. Graham , Oliver J. Sharp 
ACM Computing Surveys (CSUR) December 1994 
Volume 26 Issue 4 

In the last three decades a large number of compiler transformations for optimizing 
programs have been implemented. Most optimizations for uniprocessors reduce the number 
of instructions executed by the program using transformations based on the analysis of
scalar quantities and data-flow techniques. In contrast, optimizations for high-performance 
superscalar, vector, and parallel processors maximize parallelism and memory locality with 
transformations that rely on tracking the properties o ... 
Frontiers in Education Conference, 1998. FIE '98. 28th Annual , Volume: 3 , 4-7